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Abstract. We derive the distribution of the eigenvalues of a large sample covariance matrix when 
the data is dependent in time. More precisely, the dependence for each variable i = X,...,p is 

modelled as a linear process (Xi t t)t=l,...,n = (Ejio c i ^i,t—j)t—l n> where {Zi^t} are assumed 

to be independent random variables with finite fourth moments. If the sample size n and the 
number of variables p = p n both converge to infinity such that y = lim n _>oo n/p n > 0, then 
the empirical spectral distribution of p~ 'XX T converges to a non-random distribution which 
only depends on y and the spectral density of (X% t)tez,- In particular, our results apply to 
(fractionally integrated) ARMA processes, which we illustrate by some examples. 



1. Introduction and main result 

A typical object of interest in many fields is the sample covariance matrix (n — 1) _1 XX T of 
a data matrix X = (Xij)u, i = 1, ■ ■ ■ ,P, t = 1, . . . , n. The matrix X can be seen as a sample of 
size n of p-dimensional data vectors. For fixed p one can show, as n tends to infinity, that under 
certain assumptions the eigenvalues of the sample covariance matrix converge to the eigenvalues of 
the true underlying covariance matrix [2]. However, the assumption p <C n may not be justified if 
one has to deal with high dimensional data sets, so that it is often more suitable to assume that 
the dimension p is of the same order as the sample size n, that is p — p n — > oo such that 

71 

(1.1) lim - =: y € (0,oo). 

n— >oo p 

For a symmetric matrix A with eigenvalues Aj., . . . , A p , we denote by 

1 p 

y i=l 

the spectral distribution of A, where 5 X denotes the Dirac measure located at x. This means 
that pF A (B) is equal to the number of eigenvalues of A that lie in the set B. From now on we 
will call p _1 XX T the sample covariance matrix. Due to Eq. (1.1), this change of normalization 
can be reversed by a simple transformation of the limiting spectral distribution. For notational 
convenience we suppress the explicit dependence of the occurring matrices on n and p where this 
does not cause ambiguity. 

The distribution of Gaussian sample covariance matrices of fixed size was first computed in [20] . 
Several years later, it was Marchenko and Pastur [14] who considered the case where the random 
variables {X i t } are more general i.i. d. random variables with finite second moments EX^ = 1, 
and the number p of variables is of the same order as the sample size n. They showed that the 
empirical spectral distribution (ESD) F p xx of p _1 XX T converges, asn-> oo, to a non-random 
distribution F, called limiting spectral distribution (LSD), given by 

(1.2) F(dx) = ^:V( X + - x )( x ~ x-)l{ x _^ x < ix+ }dx 7 

and point mass F({0}) = 1 — y if y < 1; in this formula, x± = (1 ± ^Jy) 2 ■ Here and in the following, 
convergence of the ESD means almost sure convergence as a random element of the space of 
probability measures on R equipped with the weak topology. In particular, the eigenvalues of the 
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sample covariance matrix of a matrix with independent entries do not converge to the eigenvalues 
of the true covariance matrix, which is the identity matrix and therefore only has eigenvalue one. 
This leads to the failure of statistics that rely on the eigenvalues of p _1 XX T which have been 
derived under the assumption of fixed p, and random matrix theory is a tool to correct these 
statistics [4, 13]. In the case where the true covariance matrix is not the identity matrix, the LSD 
can in general only be given in terms of a non-linear equation for its Sticltjes transform, which is 
defined by 

mp[z) = / T— dF Vz £ C + := {z = u + iv £ C : 3z = v > 0}. 
J A — z 

Conversely, the distribution F can be obtained from its Stieltjes transform mp via the Stieltjes- 
Perron inversion formula ([3, Theorem B.8]), which states that 

1 f b 

(1.3) F([a,b]) = - lim / SmJi + ie)dx. 

* J a 

for all continuity points a < b of F. For a comprehensive account of random matrix theory we 
refer the reader to [1, 3, 15], and the references therein. 

Our aim in this paper is to obtain a Marchenko-Pastur type result in the case where there is 
dependence within the rows of X. More precisely, for i = 1, .. . ,p, the ith row of X is given by a 
linear process of the form 

/ oo 

(-X»,t)t=l,...,T> = ( ^""^ C jZj,t-j 

/ t=l,...,n 

Here, (Z itt ) it is an array of independent random variables that satisfies 

(1.4) EZ i>t = 0, EZf t = 1, and cr 4 := supEZf t < oo, 

i,t 

as well as the Lindebcrg-type condition that, for each e > 0, 
^ p n 

(1.5) — jE 8 ^ 1 ^,?"}) -*°> as n ^°°- 

Clearly, Eq. (1.5) is satisfied if all {Zi, t } arc identically distributed. 

The novelty of our result is that we allow for dependence within the rows, and that the equation 
for mp is given in terms of the spectral density 

f{w)=Y,lih)e- ihw , we [0,24 

hez 

of the linear processes X, only, which is the Fourier transform of the autocovariance function 

oo 

l(h) = X) c J c i+l h l' h e Z - 

Potential applications arise whenever data is not independent in time such that the Marchenko- 
Pastur law is not a good approximation. This includes e. g. wireless communications [ ] and 
mathematical finance [18, 17]. Note that a similar question is also discussed in [5]. However, they 
have a different proof which relies on a moment condition to be verified. Furthermore, they assume 
that the random variables {^i,t} are identically distributed so that the processes within the rows 
are independent copies of each other. More importantly, their results do not yield concrete formulas 
except in the AR(1) case and are therefore not directly applicable. In the context of free probability 
theory, the limiting spectral distribution of large sample covariance matrices of Gaussian ARMA 
processes is investigated in [7]. 

Before we present the main result of this article, we explain the notation used in this article. The 
symbols Z, N K, and C denote the sets of integers, natural, real, and complex numbers, respectively. 
For a matrix A, we write A T for its transpose and tr^4 for its trace. Finally, the indicator of an 
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expression £ is denoted by I^gy and denned to be one if £ is true, and zero otherwise; for a set S, 
we also write Is( x ) instead of I^ xe gj. 

Theorem 1.1. For each i = 1, . . . ,p, let X it — Xj*lo c j^i,t—ji t ^> be a linear stochastic process 
with continuously differentiable spectral density f . Assume that 
i) the array (Zij)u satisfies conditions (1.4) and (1.5), 

ii) there exist positive constants C and 8 such that \cj\ ^ C(j + 1) _1_<5 for all j ^ 0, 
Hi) for almost all AeR, /(w) = A for at most finitely many uj S [0, 2ii\, and 
iv) f'(u)) 7^ for almost every to. 

Then the empirical spectral distribution F p xx o/p _1 XX T converges, as n tends to infinity, 
almost surely to a non-random probability distribution F with bounded support. Moreover, there 
exist positive numbers A_,A + such that the Stieltjes transform z i— > mp(z) of F is the unique 
mapping C + — > C + satisfying 

(1-6) A = -4/A>r^ T. n^A. 



mp{z) 2nJ l + \m p (z) /-* ' J/'(w)| 

The assumptions of the theorem are met, for instance, if (^i,t)t is an ARMA or fractionally 
integrated ARMA process; see Section 3 for details. 

Theorem 1.1, as it stands, does not contain the classical Marchenko-Pastur law as a special case. 
For if the entries X it of the matrix X are i. i. d., the corresponding spectral density / is identically 
equal to the variance of X\ t \, and thus condition iv) is not satisfied. We therefore also present a 
version of Theorem 1.1 that holds if the rows of the matrix X have a piecewise constant spectral 
density. 

Theorem 1.2. For each i = 1, . . . ,p, let X it — Xj*lo c j^i,t—ji ^ ^> be a ^ nefflr stochastic process 
with spectral density f of the form 

k 

(1.7) /:[0,27r]->R+ w ^ ^ a d l Aj (w), fc e N, 

i=i 

/or some positive real numbers ctj and a measurable partition A\ U • • • U Afe o/ £/ie interval [0, 27r] . 
If conditions i) and ii) of Theorem 1.1 hold, then the empirical spectral distribution F p xx 
o/p _1 XX T converges, as n — > oo, almost surely to a non-random probability distribution F 
with bounded support. Moreover, the Stieltjes transform z t— > mp(z) of F is the unique mapping 
C + — > C + that satisfies 

mp(z) 2ir ^— J 1 + ajmp(z) 

where \Aj\ denotes the Lebesgue measure of the set Aj. In particular, if the entries o/X are i. i. d. 
with unit variance, one recovers the limiting spectral distribution (1.2) of the Marchenko-Pastur 
law. 

Remark 1.3. In applications one often considers processes of the form X;. t = /x + c jZi,t-j 
with mean /i ^ 0. If we denote by Xt € W the tth column of the matrix X, and define the 
empirical mean by x = p^ 1 Y^t=i x t> then the sample covariance matrix is given by the expression 
P 1 J2t=i( x t ~ x )( x t — X ) T instead of p _1 XX T . However, by [3, Theorem A. 44], the subtraction 
of the empirical mean does not change the LSD, and thus Theorems 1.1 and 1.2 remain valid if the 
underlying linear process has a non-zero mean. 

Remark 1.4. The proof of Theorems 1.1 and 1.2 can easily be generalized to cover non-causal linear 
processes, which are defined as X i t — X^-oo c jZi,t-j- For this case one obtains the same result 
except that the auto covariance function is now given by X^-oo c j c j+\h\- 

Remark 1.5. If one considers a matrix X which has independent linear processes in its columns 
instead of its rows, one obtains the same formulas as in Theorems 1.1 and 1.2 except that y is 
replaced by y" 1 . This is due to the fact that X T X and XX T have the same non-trivial eigenvalues. 
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In Section 2 we proceed with the proofs of Theorems 1.1 and 1.2. Thereafter we present some 
interesting examples in Section 3. 

2. Proofs 

In this section we present our proofs of Theorems 1.1 and 1.2. Dealing with infinite-order moving 
average processes directly is dfficult, and we therefore first prove a variant of these theorems for the 
truncated processes X i t = J2]=o c j^i-t~j- We define the p x n matrix X = (X it ) it , i = 1, . . . ,p, 
t = 1, . . . , n. 

Theorem 2.1. Under the assumptions of Theorem 1.1 (Theorem 1.2), the empirical spectral 
distribution of the sample covariance matrix of the truncated process X converges, as n tends to 
infinity, to a deterministic distribution with bounded support. Its Stieltjes transform is uniquely 
determined by Eq. (1.6) (Eq. (1.8)). 

Proof. The proof starts from the observation that one can write X = ZH, where M. px2n 9 Z = 
[Zi,t)it, i = 1, • • • ,P, t = 1 - n, . . . ,7i, and 

I c„ c n _i ... a c ... \ 



(2.1) H = 



c„ ... c 2 ci c 



\ ... C n Cn-x c J 



In particular, XX T = ZHH T Z T . In order to prove convergence of the empirical spectral dis- 
tribution F p xx and to obtain a characterization of the limiting distribution, it suffices, by 
[16, Theorem 1], to prove that the spectral distribution F HH of HH T converges to a non-trivial 
limiting distribution. This will be done in Lemma 2.2, where the LSD of HH T is shown to be 
F HH = ~d + \F T ; the distribution F T is computed in Lemma 2.3 if we impose the assumptions 
of Theorem 1.1, respectively in Lemma 2.4 if we impose the assumptions of Theorem 1.2. Inserting 
this expression for F HH into equation (1.2) of [ ] shows that the ESD F p xx converges, as 
n — y oo , almost surely to a deterministic distribution, which is determined by the requirement that 
its Stieltjes transform z H ► m(z) satisfies 

(2-2) J r = - z + 2 y f X+ f^dF"^ = z + y f + — f^dF r . 

m[z) Jx_ 1 + Ato(z) J x 1 + Xm(z) 

Using the explicit formulas of F r computed in Lemmas 2.3 and 2.4, one obtains Eqs. (1.6) and (1.8). 
Uniqueness of a mapping m : C + — > C + solving Eq. (2.2) was shown in [3, p. 88]. We complete 
the proof by arguing that the LSD of p _1 XX T has bounded support. For this it is enough, by 
[3, Theorem 6.3], to show that the spectral norm of HH T is bounded in n, which is also done in 
Lemma 2.2. □ 

Lemma 2.2. Let H = (c n _i + jl{o^ri-i+j^n})ij be the matrix appearing in Eq. (2.1), and assume 
that there exist positive constants C, S such that \cj\ ^ C(j + l) _1_<5 (assumption ii) of Theorem 1.1). 
Then the spectral norm of the matrix HH T is bounded in n. If, moreover, the spectral distribution 
of the Toeplitz matrix T — (7(1 — j))ij converges weakly to some limiting distribution F r , then the 
spectral distribution F HH converges weakly, as n —¥ 00, to ^Sq + ^F r . 

Proof. We first introduce the notation H ■= HH T 6 R 2nx2n as well as the block decomposition 
H ~ 

mT 11 1 G M™ x ". We prove the second part of the lemma first. There are several 
ri 12 tiii J 

ways to show that the spectral distributions of two sequences of matrices converge to the same 
limit. In our case it is convenient to use [3, Corollary A. 41] which states that two sequences A n 
and B n , either of whose empirical spectral distribution converges, have the same limiting spectral 
distribution if n^ 1 tr(A n — B n )(A n — B n ) T converges to zero as n tends to infinity. We shall 
employ this result twice: first to show that the LSDs of %= HH T and H ■= diag(0,'H22) agree, 



H 
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and then to prove equality of the LSDs of H22 and T. Let A-^ = n -1 tr(H — H)(T-L — H) T ; a 
direct calculation shows that A-^ = ra -1 [tr'Hn'H^ + 2tr"Hi2"Hf 2 ] i and we will consider each of 
the two terms in turn. From the definition of H it follows that the (i, j)th entry of % is given by 
c n —i+kCyi~j+k1-{-m&-K (i n^fe^min (i ■ The trace of the square of the upper left block 
of T~L therefore satisfies 

n n min(i,j) 

1 2 



tr 



«n^x=E W = E 



*j=i 



min 
k=l 



^ X! i c i+fc-iii c J+fc-iii c i+'-iii c J+'-ii 

i ,j,k,l— 1 
n+1 

sCC 4 £ ^-1-^.-1-^-1-^-1-5 
<[CC(1 + ^)] 4 <oo, 

where ((z) denotes the Riemann zeta function. As a consequence, the limit of n -1 trWu'H^ as n 
tends to infinity is zero. Similarly, we obtain for the trace of the square of the off-diagonal block of 
H the bound 

2 

n n+i i 

1 2 



n In 



trHi^f 2 = £ E £ 

2 — 1 ^' = n+l i— 1 J=7l-f 1 



^ ^ Cn—i-\~kC-n—j-\-k 
k—j — n 
n n n— z+1 n— i+1 

^EE E E c l +fc-i c fc-j c l +j-ici-j 

i=l j = l fe=j Z=j 



l||Cr||C s +j-l||C g | 



n n n n 

^EEEEi Ci +'-+j- 

i— 1 j — 1 r—0 s—0 
n+1 

<[C((i + S)} 4 < 00, 

which shows that the limit of n _1 tr'H^'H^ is zero. It follows that A^, as defined in Lemma 2.2, 
converges to zero as n goes to infinity, and therefore that the LSDs of T-L and T-L = diag(0,%22) 
coincide. The latter distribution is clearly given by F n — |<5o + \F U22 , and we show next that the 
LSD of 7^22 agrees with the LSD of T — (7(1 — As before it suffices to show, by [3, Corollary 

A. 41], that Ap = n" 1 tr("H 22 — r)(% 2 2 — T) T converges to zero as n tends to infinity. It follows 
from the definitions of T-L and F that nAr can be estimated as 

2 



lA r = E 



E °k-iCk-j ~ £ c fe-i c fe+K-j|-i 

k— max 



k = l 



= £ E c k-iCk~j - £ °k-iCk-j 

k— max A;— max 

n 00 

= ^ Ck+i-lC-k+j-lCl+i-lC-l+j-l 
i,j=l k,l=l 

sec 4 J2 jr i- 1 -^- 1 -^- 1 - 5 !- 1 - 5 < [cca + st < 00. 

i,j=2 k,l=2 

Consequently, A r converges to zero as n goes to infinity, and it follows that F n = \5 a + \F T . 
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In order to show that the spectral norm of H = HH T is bounded in n, we use Gerschgorin's 
circle theorem ([8, Theorem 2]), which states that every eigenvalue of T-L lies in at least one of the 
balls B(W l ,Ri) with centre W and radius Ri, i — 1, . . . , 2n, where the radii Ri are defined as 
Ri — ^2j^i \ H^\. We first note that the centres W % satisfy 

min{z,n} n 

H u = £ c 2 n _ l+k [CC(2 + 2S)f < oo. 

k — max {1,*— n } k—Q 

To obtain a uniform bound for the radii Ri we first assume that i — 1, . . . , n. Then 

n min{i,j} 2n i 

1^1 S l c n-i+fe||Cn-j+fc| + l^-i+fcll^-J+fcl 

7 — 1 fe=l j— n+1 k—j — n 

n 2n—i n—j 

< ^ |c„-. 1+fc ||c J+fc -i|+ ^ ^|c fc+J ||c fc K2[CC(l + ^)] 2 <oo. 

i,fe=l j— n+l — i fc— 

Similarly we find that, for i = n + 1, . . . , 2n, 

n J 2n n 

j=l k—i—n j— n+l fe= max{i,j} — n 

i— 1 n+l— j 2n n— max{i,j'} 

^E E E E ic fc iic fc+ | i _ i ,i<3[cc(i+*)] a 

j—i—n k—0 j—n+1 k—Q 

is bounded, which completes the proof. □ 

In the following two lemmas, we argue that the distribution F r exists and we prove explicit 
formulas for it in the case that the assumptions of Theorem 1.1 or Theorem 1.2 are satisfied. 

Lemma 2.3. Let [cj)j be a sequence of real numbers, 7 : h 1— > Y^jLo c j c j+\ h \ > an d f '• w ^ 
Sftez l{h)e~ lhul . Under the assumptions of Theorem 1.1 it holds that the spectral distribution F r 
o/r = (7(2 — converges weakly, as n — > 00, to an absolutely continuous distribution F T with 

bounded support and density 

(2-3) 5 :(A_,A + )^M+, X^±- £ ^ 

w:/(w)=A 

Proof. We first note that under assumption ii) of Theorem 1.1 the autocovariance function 7 is 
absolutely summable because 

00 00 00 00 

EItWKEENI^K 03 E ^ 1 ^r 1 - 5 <[cc(i + <f<^. 

h=0 h=Oj=Q h,j = l 

Szego's first convergence theorem ([11] and [10, Corollary 4.1]) then implies that F T exists, and 
that the cumulative distribution function of the eigenvalues of the Toeplitz matrix T associated 
with the sequence h 1— > 7(/i) is given by 

1 f 2rr .1 



(2.4) G(A) := — J l {/(wK A}du; = — Leb({w e [0, 2tt] : /(w) < A}), 

for all A such that the level sets {cj € [0, 2ir] : f(u)) = A} have Lebesgue measure zero. By 
assumption iii) of Theorem 1.1, Eq. (2.4) holds for almost all A. In order to prove that the LSD 
F r is absolutely continuous with respect to the Lebesgue measure, it suffices to prove that the 
cumulative distribution function G is differentiable almost everywhere. Clearly, for AA > 0, 

G(A + AA) - G(A) = — Leb({w € [0, 2tt] : A < f(u) < A + AA}). 
2n 

Due to assumption iv) of Theorem 1.1, the set of all A E E such that the set {u :€ [0, 2tt] : f(oj) = 
A and /'(w) = 0} is non-empty is a Lebesgue null-set. Hence it is enough to consider only A for 
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which this set is empty. Let / _1 (A) = {ui : f(uj) — A} be the pre-image of A, which is a finite set 
by assumption iii). The implicit function theorem then asserts that, for every w € / _1 (A), there 
exists an open interval / w around w such that / restricted to I u is invertible. It is no restriction 
to assume that these 1^ are disjoint. By choosing A A sufficiently small it can be ensured that 
the interval [A, AA] is contained in n^e/- 1 ^) f(Jv)i ancl from the continuity of / it follows that 
outside of LLe/-i(A) -^>> the values of / are bounded away from A, so that 

Km — [G(A + AA) - G(A)1 

AA^O AA 

= ^ A lim o ^Leb ( [J {u' G I w : A < /(a/) < A + AA} 

~ E J^^Leb^'e^ : A </(./) <A + AA}). 

In order to further simplify this expression, we denote the local inverse functions by f^ 1 : f{I^) — > 
[0, 2ir]. Observing that the Lebesgue measure of an interval is given by its length, and that the 
derivatives of f~ x are given by the inverse of the derivative of /, it follows that 

lim 4r [G(X + AA) - G(A)1 V lim 4x |/" 1 (A + AA) - /jVA)! 

d 



- E 

w£/-i(A) 

1 E 1 



dA /-(A) 



we/ 2 (A) 

This shows that G is differentiable almost everywhere with derivative j : A H ^ |/'Lj)| • 

It remains to argue that the support of F r is bounded. The absolute summability of 7(-) implies 
boundedness of its Fourier transform /. The claim then follows from Eq. (2.4), which shows that 
the support of g is equal to the range of /. □ 

Lemma 2.4. Let f : ui t- > 53j=i a jlA, (w) be the piecewise constant spectral density of the linear 

-\oo 



process X t = Y]j—Q CjZt-ii an d denote the corresponding autocovariance function by 7 : h n- 

=0 C j C j + \h\ 



c j c j+\h\ ■ Under the assumptions of Theorem 1.2 it holds that the spectral distribution F T of 



r = (7(1 — converges weakly, asn-> 00, to the distribution F r = (2n) 1 Y]j—i \ Aj\S a:j . 

Proof. Without loss of generality we may assume that < ot\ < . . . < a%. As in the proof of 
Lemma 2.3 one sees that F T exists, and that F r (— 00, A) is given by 

k 

G(A) := ±- Leb({w e [0, 2rr] : f(u) < A}), VA € [0, 2rr]\ Q {a,}. 

3=1 

The special structure of / thus implies that G(A) = (2tt)~ 1 Ylj=i where k\ is the largest 
integer such that a^ x A. Since G must be right-continuous, this formula holds for all A in the 
interval [0, 2tt], It is easy to see that the function G is the cumulative distribution function of the 
discrete measure (2tt)~ 1 Xw=i IAjI^p which completes the proof. □ 

Proof, of Theorems 1.1 and 1.2 It is only left to show that the truncation performed in Theorem 2.1 
does not alter the LSD, i. e. that the difference of F p xx and F p xx converges to zero almost 
surely. By [3, Corollary A. 42], this means that we have to show that 

(2.5) \ tr(XX T + XX T ) \ tr((X - X)(X - X) T ) 

pi pz 
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converges to zero. To this end we show that I has a limit, and that II converges to zero, both 
almost surely. By the definition of X and X we have 



y Tt' OO ■^^j 

II= ^EE E X c kCmZ t: t- k z t . 



i—l t— 1 k—n+1 m— n+1 

We shall prove that the variances of II are summable. For this purpose we need the following two 
estimates which are implied by the Cauchy-Schwarz inequality, the assumption that 174 = sup.^ t EZf t 
is finite, and the assumed absolute summability of the coefficients (cj)j: 

p n 00 / 00 \ ^ 

(2.6a) E XX X \ c kCmZi,t-kZi, t -m\ ^ pn |c fe | < 00, 

7=1 t = l fc, 777=1 \fc=l / 

p n 00 

(2.6b) E X] X! X {ckCmCk'Cm'Ziit-kZit-mZi'J'-k'Zi'J'-m'l 

i,i'=l=l t,t' = l fc,fc',m,m'-l 

Therefore we can, by Fubini's theorem, interchange expectation and summation to bound the 
variance of II as 

_^ p n 00 

Var(II) ^ ^ ^ ^2 CkCmCk'Cm'EiZt.t-kZi.t-rnZi'j'-k'Zi'j'-m')- 
P t,i' = \t,t' = \ k,k' 

Considering separately the terms where i = i' and i 7^ i' ', we can write 

^ P n \ 

Var(II) XX X CkCmCk'Cm'E^ij-kZij-mZi'j'-k'Zi^t'-m') 

= l t,t' = l k,k' 
. , ,=n+l 
1^% m,m 



P n 

H 4 c /c c mCfc'C ?Ti 'E(Z ijt _! c Z ijt _,, ; Z, -Z, ./-_,„ ' 



P 

i=l t,t' = l k,k' 

771,777 

For the expectation in the first sum not to be zero, k must equal m and k' must equal ml 1 in which 
case its value is unity. The expectation in the second term can always be bounded by (74, so that 
we obtain 

F \fc = 77+l / ^ \fc=77 + l / 

Due to Eq. (1.1) and the assumed polynomial decay of Ck there exists a constant K such that the 
right hand side is bounded by KnT 1 ^^ , which implies that 

00 00 

Var (II) K n^ iS < 00, 
77=1 77=1 

and therefore, by the first Borel-Cantelli lemma, that II converges to a constant almost surely. In 
order to show that this constant is zero, it suffices to shows that the expectation of II converges 
to zero. Since EZ i t = 0, and the {Z i t \ are independent, one sees, using Eq. (2.6a) and again 
Fubini's theorem, that E(II) = np _1 Y^k=n+i c fc' which converges to zero because the {ck} are 
square-summable. 

We now consider factor I of expression (2.5) and define Ax = XX T — XX T . Then 
(2.7) 1= ^tr(Ax)+2^tr(XX T ). 
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Because of 

n 

(XX T )jj = y^Xf t = Ck c mZi,t-kZi y t- 



n oo oo 



t=l t=l fc=0m=0 

V^ 71 



and similarly (XX T ) lt = J2t=i Sfe=o ELo c fc c m Z M _ fc Z M _ m , we have that 
tr(Ajr) = £ [(XX T ) lt - (XX T ) 



i=l 
p n oo 



i—l i— 1 fc— n+1 m— n+1 



=II->0 a.s. 

71 CX) 73 



(2-8) ^ c k c mZi,t-kZj,t-m- 

i—1 t—1 k— n+1 m=l 

Equation (2.6b) allows us to apply Fubini's theorem to compute the variance of the second term in 
the previous display as 

. p n \ n 

— l t^t' — l k,k' — n+1 m,m' — 1 

which is, by the same reasoning as we did for II, bounded by 

/oo \ 2 / " \ 2 

P \k=n+l / \m=l / 

for some positive constant K. Clearly, this is summable in n. Having, by Eq. (2.6a), expected 
value zero, the second term of Eq. (2.8) and, therefore, also tr(Ax) both converge to zero almost 
surely. Thus, we only have to look at the contribution of lb in expression (2.7). From Theorem 2.1 
we know that F p xx converges almost surely weakly to some non-random distribution F with 
bounded support. Hence, denoting by Ai, . . . , \ p the eigenvalues of p _1 XX T , 

Ih = p tr (p^ T ) = pE Al = / AdFp xx % J XdF < oo, 

almost surely. It follows that, in Eq. (2.5), factor I is bounded, and factor II converges to zero, and 
so the proof of Theorems 1.1 and 1.2 is complete. □ 



3. ILLUSTRATIVE EXAMPLES 

For several classes of widely employed linear processes, Theorem 1.1 can be used to obtain 
an explicit description of the limiting spectral distribution. In this section we consider the class 
of autoregressive moving average (ARMA) processes as well as fractionally integrated ARMA 
models. The distributions we obtain in the case of AR(1) and MA(1) processes can be interpreted 
as one-parameter deformations of the classical Marchenko-Pastur law. 

3.1. Autoregressive moving average processes. Given polynomials a : z h-> 1 + a\z + . . . a p z p 

and b '. z i — y 1 ~\~ b\z + . . . + b q z q , an ARMA(p,q) process X with autoregressive polynomial a 
and moving average polynomial b is defined as the stationary solution to the stochastic difference 
equation 

X t + a\X t ^i + . . . + a p X t - p = Z t + biZ t -i + . ■ ■ + b q Z t - qi teZ. 
If the zeros of a lie outside the closed unit disk, it is well known that X has an infinite-order 
moving average representation X t — Y^jLo^^*-^ wnere { c j} are the coefficients in the power 
series expansion of b(z)/a(z) around zero. It is also known ([6]) that there exist positive constants 
p < 1 and K such that \cj\ ^ Kp>, so that assumption ii) of Theorem 1.1 is satisfied. While the 
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XXX 



(a) y = 1 (b) y = 3 (c) j/ = 5 

FIGURE 1. Limiting spectral densities A i— > p(X) of p _1 XX T for the MA(1) process 
X t = Z t + -&Z t -\ for different values of $ and y — n/p 




(a) y = 1 (b) j/ = 3 (c) y = 5 



Figure 2. Limiting spectral densities A i-> p(A) of p 1 XX T for the AR(1) process 
X t = tpX t -\ + Z t for different values of tp and y = n/p 



autocovariance function of a general ARMA process does not in general have a simple closed form, 
its Fourier transform is given by 

2 



(3.1) 



Me" 



a (e iw ) 



u e [0, 2tt] 



Since / is rational, assumptions iii) and iv) of Theorem 1.1 are satisfied as well. In order to compute 
the LSD of r, it is necessary, by Lemma 2.3, to find the roots of a trigonometric polynomial of 
possibly high degree, which can be done numerically. 

We now consider the special case of the ARMA(1,1) process X t = (pX t -i + Z t + ftZt-i, \ip\ < 1, 
for which one can obtain explicit results. By Eq. (3.1), the spectral density of X is given by 



1 + tf 2 + 2t?cosw 



oj e [0,2tt]. 



1 + ip 2 — 2ip cos uj ' 

Equation (2.3) implies that the LSD of the autocovariance matrix L has a density g, which is given 
by 



<?(A) 



E 



2tt ^ 



where 



7T(^ + <p\)y/[(l + I?) 2 - A(l - yf\ [A(l + UJ) 2 - (1 - l?) 2 j 

(l±tf) 2 



(A), 



A_ 



fin (A ,A + ), A-| 



:(A-,A+), A d 



(1T^) 2 ' 
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X X 



(a) y = 1 (b) y = 3 (c) y = 5 

FIGURE 3. Histograms of the eigenvalues and limiting spectral densities A H> p(A) 
of p^XX 7 for the ARMA(1,1) process X f = \X t -\ + Z t + Z t -i for different 
values of y = n/p, p = 1000 



By Theorem 1.1, the Stieltjes transform z i— > m z of the limiting spectral distribution of p 'XX T 
is the unique mapping m : C + — > C + that satisfies the equation 

1 f x + Xg{X) 
— — z + y \ dA 



(3.2) =-z + 



m z J \ 1 + Am 



•dm z — ip 



(■d + <p){l + tiip)y 



(•dm z - (p)y/[(l - p) 2 + m z {l + d) 2 } [(1 + p) 2 + m z (l - t?) 2 ] ' 

This is a quartic equation in m z = m(z) which can be solved explicitly. An application of the 
Stieltjes inversion formula (1.3) then yields the limiting spectral distribution of p _1 XX T . 

If one sets (p — 0, one obtains an MA(1) process; plots of the densities obtained in this case 
for different values of •& and y are displayed in Fig. 1. Similarly, the case $ = corresponds to an 
AR(1) process; see Fig. 2 for a graphical representation of the densities one obtains for different 
values of if and y in this case. For the special case (p= 1/2, i? = 1, Fig. 3 compares the histogram 
of the eigenvalues of p _1 XX T with the limiting spectral distribution obtained from Theorem 1.1 
for different values of y. 

Equation (3.2) for the Stieltjes transform of the limiting spectral distribution of the sample 
covariance matrix of an ARMA(1,1) process should be compared to [5, Eq. (2.10)], where the 
analogous result is obtained for an autoregressive process of order one. They use the notation 
c = limp/n and consider the spectral distribution of n _1 XX T instead of p _1 XX T . If one observes 
that this difference in the normalization amounts to a linear transformation of the corresponding 
Stieltjes transform, one obtains their result as a special case of Eq. (3.2). 

3.2. Fractionally integrated ARMA processes. In many practical situations, data exhibit 
long-range dependence, which can be modelled by long-memory processes. Denote by B the 
backshift operator and define, for d > — 1, the (fractional) difference operator by 

v d =(i-B) d =£nHr^ BJ < Bix * =x *-i' 

j=0 k=l 

A process (A t ) t is called a fractionally integrated ARMA(p,d,q) processes with d e (—1/2, 1/2) 
and p, q € N if (X/ d X t )t is an ARMA(p,q) process. These processes have a polynomially decaying 
autocorrelation function and therefore exhibit long-range-dependence, cf. [(i, Theorem 13.2.2] and 
[9, 12]. We assume that d < 0, and that the zeros of the autoregressive polynomial a of (V d X t ) t 
lie outside the closed unit disk. Then it follows that X has an infinite-order moving average 
representation X t — J2'jLo c j^t-j, where the (cj)j have, in contrast to our previous examples, 
not an exponential decay, but satisfy Ki(j + l) d_1 ^ Cj ^ K 2 (j + l) d_1 , for some Ki,K 2 > 0. 
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Therefore, if d < 0, one can apply Theorem 1.1 to obtain the LSD of the sample covariance matrix, 
using that the spectral density of (X t )t is given by 
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